Attrasoft image retrieval

ABSTRACT

A system, methods, and algorithms for content-based image retrieval and recognition, useful for all types of images and image formats. An image(s) or an image segment(s), which the user specifies in two clicks (the first in the upper-left corner and the second in the bottom-right corner), serves as the content-based sample. The sample image(s) is used to teach the system what to look for via the ABM (Attrasoft Boltzmann Machine) algorithm and the APN (Attrasoft PolyNet) algorithm; the system then searches through one or many directories, which are specified by the user, and presents the search results. The search results consist of pairs of a matched image and a weight (score), which specifies the similarity between the sample and the matched image. These weights are also used to classify images in classification problems. The users are able to view the retrieved images in the results via a single click. When the algorithm is implemented as a software component, the system integration will follow the specification of the “Attrasoft Image Verification and Identification Application Programming Interface (IVI-API)”.

TECHNICAL FIELD

[0001] The present invention relates generally to image retrieval and image recognition, and more particularly to a system, methods, and algorithms for content-based image retrieval and recognition. Within such a system, the image(s) to be retrieved/recognized is not preprocessed by associating key words (meta-data) with it. This system allows the user of an image retrieval/recognition system, such as software together with a computer, network server, or web server, etc., to define search criteria by using an image(s), a segment of an image(s), a directory containing images, or combinations of the above. The system returns the result, which contains pairs of a matched image and its similarity. The user can see the matched images with a single click.

[0002] This invention can be used in image verification (1:1 matching, binary output: yes/no), image identification (1:N matching, single output to indicate a classification), image search or retrieval (1:N matching, multiple outputs), and image classification (N:1 or N:N matching). For simplicity, we will use only the word “retrieval”.

BACKGROUND OF THE INVENTION

[0003] In certain types of content-based image retrieval/recognition systems, the central task of the management system is to retrieve images that meet some specified constraints.

[0004] Most image-retrieval methods are limited to the keyword-based approach. In this approach, keywords and the images together form a record in a table. The retrieval is based on the keywords in much the same way as the relational database. (Example: Microsoft Access).

[0005] The user operation is generally divided into two phases: the learning phase and the search/recognition phase. In the learning phase, various types of processes, such as image preprocessing and image filtering are applied to the images. Then the images are sent to a recognition module to teach the module the characteristics of the image. The learning module can use various algorithms to learn the sample image. In the search/retrieval phase, the recognition module decides the classification of an image in a search directory or a search database.

[0006] A very small number of commercially available products exist which perform content-based image retrieval.

[0007] Informix Internet Foundation.2000 is an object-relational database management system (ORDBMS), which supports non-alphanumeric data types (objects). IIF2000 supports several DataBlade modules, including the Excalibur Image DataBlade module, to extend its retrieval capabilities. DataBlade modules are server extensions that are integrated into the core of the database engine. The Excalibur Image DataBlade is based on technology from Excalibur Technologies Corporation, and is co-developed and co-supported by Informix and Excalibur. The core of the DataBlade is the Excalibur Visual RetrievalWare SDK. The Image DataBlade module provides image storage, retrieval, and feature management for digital image data. This includes image manipulation, I/O routines, and feature extraction to store and retrieve images by their visual contents. An Informix database can be queried by aspect ratio, brightness, global colour, local colour, shape, and texture attributes. An evaluation copy of IIF2000 and the Excalibur Image DataBlade module can be downloaded from www.informix.com/evaluate/.

[0008] IMatch is a content-based image retrieval system developed for the Windows operating system. The software was developed by Mario M. Westphal and is available under a shareware license. IMatch can query an image database by the following matching features: colour similarity, colour and shape (Quick), colour and shape (Fuzzy), colour percentage, and colour distribution. A fully functional 30-day evaluation copy is available for users to assess the software's capabilities and can be downloaded from www.mwlabs.de/download.htm. The shareware version limits the number of images that can be added to a database to 2000. A new version of the software was released on Feb. 18, 2001.

[0009] The Oracle8i Enterprise Server is an object-relational database management system that includes integral support for BLOBs. This provides the basis for adding complex objects, such as digital images, to Oracle databases. The Enterprise release of the Oracle database server includes the Visual Information Retrieval (VIR) data cartridge developed by Virage Inc. OVIR is an extension to Oracle8i Enterprise Server that provides image storage, content-based retrieval, and format conversion capabilities through an object type. An Oracle database can be queried by global color, local color, shape, and texture attributes. An evaluation copy of the Oracle8i Enterprise Server can be downloaded from otn.oracle.com.

SUMMARY OF THE INVENTION

[0010] The present invention is different from the Informix database, where images can be queried by aspect ratio, brightness, global colour, local colour, shape, and texture attributes. The present invention is different from IMatch, where images can be queried by colour similarity, colour and shape (Quick), colour and shape (Fuzzy), colour percentage, and colour distribution. The present invention is different from the Oracle8i Enterprise Server, where images can be queried by global color, local color, shape, and texture attributes.

[0011] The present invention is unique in its sample image, control process, control parameters, and algorithms. The current algorithms do not use the methodologies deployed in the above systems. In particular, the following parameters are not used: aspect ratio, brightness, global colour, local colour, shape, colour similarity, colour and shape (Quick), colour and shape (Fuzzy), colour percentage, colour distribution, and texture attributes. The present invention has nothing in common with any existing system. Even though the current invention is applied to images, the algorithms in the invention can be applied to other types of data, such as sound, movies, etc.

[0012] 1. Process

[0013] The present invention is a content-based image retrieval/recognition system, where users specify an image(s) or segment(s), adjust the control parameters of the system, and query for all matching images from an image directory or database. The user operation is generally divided into two phases: the learning phase and the search/recognition phase. In the learning phase, various types of processes, such as image preprocessing, image size reduction, and image filtering, are applied to the images. Then the images are sent to a recognition module to teach the module the characteristics of the image as specified by an array of pixels. Each pixel is defined by an integer, which can have any number of bits. The learning module can use the ABM or APN learning algorithm to learn the sample image. Both algorithms are described in the present invention.

[0014] In the search/retrieval phase, the recognition module decides the classification of an image in a search directory or a search database.

[0015] In a retrieval/recognition system, “training” the system, or “learning” by the system, teaches the system what characteristics of an image, or a segment of an image (key), to look for. A system operator completes this step by specifying the sample image(s), specifying the parameters, and clicking one button, the “training” button, which appears in the graphical user interface of the system. “Retraining” teaches the system what characteristics of images to look for after the system is already trained. Training and retraining together allow the system to learn from many sample image(s) and segment(s) simultaneously.

[0016] A “search” or “retrieval” looks for matching images in an image source, such as a directory, many directories, subdirectories, a network, the Internet, or a database, etc. A system operator completes this step by specifying the image source, such as search directory(s), specifying the parameters, and clicking one button, the “searching” button, which appears in the graphical user interface of the system. The results can be displayed within the software system or displayed in a program created by the system. Two particular applications are image verification (1:1 matching, binary output: yes/no) and image identification (1:N matching, single output to indicate a classification).

[0017] A “classification” or “recognition” repeats training and search for each category of images. At the end, a system operator clicks one button, the “classification” button, which appears in the graphical user interface of the system. The results can be displayed within the software system or displayed in a program created by the system. Classification is an N:N matching with a single output to indicate a classification.

[0018] The parameters and settings of a particular operation can be saved and recalled later. A saved operation can be recalled by clicking a button, cutting and pasting, opening a file, or typing. The saved settings are called “batch code”. The “Batch” buttons provide a means to execute these saved batch codes.

[0019] A “process” is a sequence of training and searching, or a classification, or a specification of a batch code and execution of a batch code. They are further divided into a search process, a classification process, and a batch process.

[0020] After the operator completes a process, the results consist of a list of pairs; each pair consists of a matched image and its “weight”, which reflects how closely the matched image matches the sample image(s). This list can be sorted or unsorted. The list provides a link to each matched image so that the matched images can be viewed with a single click.

[0021] “System integration” is to combine a software component, which is an implementation of this invention, with an application interface.

[0022] The search process, which is applicable to retrieval, verification, and identification, is:

[0023] 1. Enter key image into the system;

[0024] 2. Set training parameters and click the training button to teach the system what to look for;

[0025] 3. Enter search-directory(s);

[0026] 4. Set search parameter(s), and click the search button;

[0027] 5. The system output is a list of names and weights:

[0028] The weight of an image is related to the characteristics you are looking for (the weight is similar to an Internet search engine weight);

[0029] Click the name of each image and an image will pop up on the screen.

[0030] FIG. 1 is the flow chart version of this algorithm.
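The five steps of the search process can be sketched in Python as follows. This is only an illustration of the control flow, not the ABM or APN algorithm: the scoring function `toy_weight` is a hypothetical stand-in that counts equal pixels, and the names `train` and `search`, as well as the in-memory dictionary that plays the role of a search directory, are assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[int]  # flat pixel array, one integer per pixel


def toy_weight(key: Image, candidate: Image) -> int:
    """Hypothetical stand-in for the learned weight: count of equal pixels."""
    return sum(1 for a, b in zip(key, candidate) if a == b)


def train(key: Image) -> Callable[[Image], int]:
    """Step 2: 'teach' the system by binding the key image into a scorer."""
    return lambda candidate: toy_weight(key, candidate)


def search(scorer: Callable[[Image], int],
           directory: Dict[str, Image]) -> List[Tuple[str, int]]:
    """Steps 3 to 5: score every image and return (name, weight) pairs,
    sorted so that the closest matches come first."""
    results = [(name, scorer(img)) for name, img in directory.items()]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Step 1: the key image; the dictionary stands in for a search directory.
key_image = [0, 1, 1, 0]
directory = {"a.jpg": [0, 1, 1, 0], "b.jpg": [1, 1, 1, 1], "c.jpg": [1, 0, 0, 1]}
matches = search(train(key_image), directory)
```

In a real system the image names in `matches` would be rendered as links so that each matched image opens with a single click.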

[0031] The classification process is:

[0032] 1. Enter key image into the system;

[0033] 2. Set training parameters and click the training button to teach the system what to look for;

[0034] 3. Enter search-directory(s);

[0035] 4. Set search parameter(s), and click the search button;

[0036] 5. Repeat the above process for each class and then click the “Record” button. At the end, click the “Classification” button. The output web page will first list the sample images for each class. Then it will list:

[0037] An image link for each image in the search directory;

[0038] The classification weights of this image in each search; and

[0039] The classification of this image as a link.

[0040] The batch process is:

[0041] 1. Provide the batch code to the system, which includes:

[0042] Click the save button to save the current setting, including key(s), search directory(s), and parameters into a batch code.

[0043] Click a file button to recall one of the many batch codes saved earlier.

[0044] Cut and paste or simply type in a batch code by keyboard.

[0045] 2. Click batch button to execute the code.
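A batch code can be pictured as the current settings serialized into a single piece of text that can be saved to a file, pasted, or typed. The sketch below assumes JSON as the text format and uses hypothetical field names (`keys`, `search_dirs`, `parameters`); the actual batch code format is not specified here.

```python
import json
from typing import Any, Dict, List


def save_batch_code(keys: List[str], search_dirs: List[str],
                    parameters: Dict[str, Any]) -> str:
    """Bundle the current settings into one text string (the 'batch code')."""
    return json.dumps(
        {"keys": keys, "search_dirs": search_dirs, "parameters": parameters},
        sort_keys=True,
    )


def load_batch_code(batch_code: str) -> Dict[str, Any]:
    """Recall settings from a batch code that was saved, pasted, or typed."""
    settings = json.loads(batch_code)
    missing = {"keys", "search_dirs", "parameters"} - settings.keys()
    if missing:
        raise ValueError(f"batch code is missing fields: {sorted(missing)}")
    return settings


# Hypothetical settings; the parameter names are illustrative only.
code = save_batch_code(["key.jpg"], ["./images"], {"blurring": 10, "shape_cut": 0})
settings = load_batch_code(code)
```

Executing the batch code then amounts to running the search or classification process with the recalled settings.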

[0046] An integration process is to combine a software component, which is an implementation of this invention, with an application interface. This invention also specifies a graphical user interface for the integration.

[0047] 2. Parameters

[0048] The search, classification, and batch processes require a set of parameters. All the parameters can be specified in the system user interface, either through clicking buttons or through Windows. The parameters are specially related to the ABM and APN algorithms, which will be claimed in this patent.

[0049] The “Area of Interest” specifies an image segment, which is specified by 4 numbers: the coordinates of the upper-left corner and the bottom-right corner.

[0050] The “internal representation” specifies the dimensions of a pixel array used for computation, which may or may not be the actual image pixel array.

[0051] The “Background” or “Background filter” selects an image-processing filter the pixel array must pass through before entering the learning component of the system.

[0052] The “Symmetry” represents similarity under certain types of changes, such as intensity, translation symmetry, Scaling, Rotation, oblique, combined rotation and scaling or any combination thereof.

[0053] The “Rotation Types” specify the range of rotation if the rotation symmetry is used. Examples are 360°-rotations, −5° to 5° rotations, and −10° to 10° rotations, or other settings that fit the user's need.

[0054] The “Reduction Type” specifies the method used when reducing a large image pixel array to a smaller pixel array.

[0055] The “Sensitivity” deals with the sample segment size; high sensitivity is for small segment(s) and low sensitivity is for large segment(s).

[0056] The “Blurring” measures the distortion due to data compression, translation, rotation, scaling, intensity change, and image format conversion.

[0057] The “Shape Cut” is to eliminate many images that have different shapes from the sample segment.

[0058] The “External Weight Cut” is to list only those retrieved images with weights greater than a certain value. The Weight Cut is an integer greater than or equal to 0. There is no limit on how large this integer can be. The “Internal Weight Cut” plays a role similar to the External Weight Cut, but uses a percent value rather than an absolute weight value.

[0059] The “Image Type” tells the learning component whether to treat the pixel array as a black-and-white image or a color image. It also instructs the learning component whether to use a maximum value, integration, or both.

[0060] The “L/S Segment” (Large/Small segment) tells the system where to focus when searching images.

[0061] The “Short/Long” search specifies an image source such as whether to search one directory or many directories.

[0062] The “Short Cut” is a Scrollbar to select an integer between 0 and 99; each integer is mapped to a set of predefined settings for the parameters.

[0063] The “Border Cut” controls the portions of images to be used in the image recognition.

[0064] The “Segment Cut” controls the threshold used to reduce an image into an internal representation.

[0065] 3. System Layout

[0066] Attrasoft Component-Object structure consists of three layers (See FIG. 2):

[0067] Application Layer

[0068] Presentation Layer

[0069] ABM Network Layer

[0070] The ABM Network Layer has two algorithms to be claimed in the present invention:

[0071] ABM (Attrasoft Boltzmann Machine);

[0072] Attrasoft PolyNet (APN): multi-valued ABM.

[0073] This layer is responsible for learning and classification.

[0074] The Presentation Layer is an interface between the ABM net layer and the user interface layer. There are two types of data used by the systems: user data or application data, and ABM neural data. ABM networks use ABM neural data. User data depends on the application. The presentation layer converts the image data into neural data used by the ABM layer component.

[0075] The Application Layer is the front-end graphical user interface, which the users see directly. This layer collects all parameters required for necessary computation.

[0076] 4. Algorithms

[0077] The ABM layer deploys two algorithms, ABM and APN. The ABM and APN algorithms consist of a combination of Markov Chain Theory and the Neural Network theory. Both theories are well known. The ABM and APN algorithms are newly invented algorithms, which have never been published.

[0078] The following terms are well known: Markov chain, state of Markov chain, invariant distribution.

[0079] The basic flow chart for the ABM and APN algorithms is:

[0080] 1. Combine an image and its classification into a vector.

[0081] 2. All such vectors together form a mathematical configuration space. Each point in such a space is called a state.

[0082] 3. A Markov chain exists in such a space where the state of the configuration space is a state of the Markov chain.

[0083] 4. The Markov chain will settle on its invariant distribution. A distribution function is deployed to describe such a distribution. In particular, such distribution function classifies the images.

[0084] 5. The construction of such a Markov chain is by a particular type of neural network, called ABM network or APN network. This type of neural net satisfies 3 features: (1) fully connected; (2) the order of the neural net is the same as the number of neurons in the network, i.e. the number of connections is an exponential function of the number of neurons; and (3) the connections follow particular algorithms, known as ABM and APN algorithms.

[0085] Step 4 above is defined as follows:

[0086] Let x be an image, and let a, b be two classes; then the two possible vectors are (x, a) and (x, b).

[0087] Let a distribution function be z=F(y), where y is a vector. If z=z1 for y=(x, a) and z=z2 for y=(x, b), then the probability of x in class a is z1 and the probability of x in class b is z2. The result will be {(x, a, z1), (x, b, z2)}. The users will see results like this directly in the output of the system.

[0088] In the ABM or APN algorithms, content-based image retrieval and image recognition are basically the same problem; therefore, they can be converted from one to the other. To convert from an image search problem to an image recognition problem, one query is required for each class. To see whether an image, say B, is in class A, you first train ABM with all images in class A, then try to retrieve image B. If image B is not retrieved, then image B is not in class A. If image B is retrieved only for class A, then image B is in class A. If image B is retrieved for several classes, the class with the largest relative probability is the one to which image B belongs. Image search is an image classification problem with only 1 class.
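The conversion from search to recognition can be sketched as follows, assuming one query has already been run per class and has produced a weight for the image. The function name `classify` is hypothetical, a weight of 0 is assumed to mean "not retrieved", and "relative probability" is read here as each weight's share of the total, which is one possible interpretation.

```python
from typing import Dict, List, Optional, Tuple


def classify(image_name: str, weights_by_class: Dict[str, float]
             ) -> Tuple[List[Tuple[str, str, float]], Optional[str]]:
    """Return the (image, class, weight) triplets and the winning class.

    weights_by_class holds one entry per class query; a weight of 0
    is taken to mean the image was not retrieved for that class.
    """
    triplets = [(image_name, cls, w) for cls, w in weights_by_class.items()]
    retrieved = {cls: w for cls, w in weights_by_class.items() if w > 0}
    if not retrieved:
        return triplets, None  # not retrieved for any class
    total = sum(retrieved.values())
    # Relative probability: each weight as a share of the total weight.
    best = max(retrieved, key=lambda cls: retrieved[cls] / total)
    return triplets, best


triplets, best = classify("B.jpg", {"a": 30.0, "b": 90.0, "c": 0.0})
```

With a single class the same routine reduces to a plain search: the image is either retrieved for that class or it is not.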

[0089] ABM is a binary network. APN is a multi-valued network.

[0090] 5. Components and Application-Programming Interface

[0091] Software components can be isolated to be attached to different front-end systems. This can be done with ABM neural layer alone, or both ABM layer and presentation layer. The ABM layer component is a core of the present invention. The value of such a sub-system is the same as the whole system.

[0092] This invention also defines the application-programming interface (API), which specifies the system integration. This API is called IVI-API.

BRIEF DESCRIPTION OF VIEWS OF THE DRAWING

[0093] FIG. 1 shows the algorithm of the Search Process, which is applicable for image verification, identification, and retrieval.

[0094] FIG. 2 shows a 3-Layer Internal Architecture.

[0095] FIG. 3 shows a sample User Interface of the Present Invention.

[0096] FIG. 4 shows a sample Key Input for the Present Invention.

[0097] FIG. 5 shows a sample Search Output of the Present Invention. The search output is a list of pairs.

[0098] FIG. 6 shows a sample Classification output of the Present Invention. The classification output is a list of triplets.

DETAILED DESCRIPTION OF THE DISCLOSED EMBODIMENT

[0099] Preferred Embodiment of the Search System

[0100] An image search/classification system constructed in accordance with the preferred embodiment comprises a computer-based workstation including a monitor, keyboard, and mouse; a content-based image retrieval software system; and a source of images.

[0101] The source of the images may be on the local drive, network or the Internet. The source is connected to the workstation. The source of images may be accessed directly via open files, or indirectly, such as going into a file to find the images or going into a database application to find the images, etc.

[0102] The preferred workstation can be a PC or any other type of computer that connects to a data source.

[0103] The preferred content-based image retrieval software system is any software that has the ABM or APN algorithm as a component. It can be a Windows-based system, a system based on any other operating system, or an Internet-based system.

[0104] Overview of the ABM Algorithm

[0105] The following terms are well known: synaptic connection or connection. The basic flow chart for ABM algorithm is:

[0106] 1. Create an ABM net with no connections;

[0107] 2. Combine an image and its classification into an input vector.

[0108] 3. Impose the input vector to the learning module.

[0109] 4. The ABM neural connections are calculated based on the input vector. Let N be the number of neurons; the order of connections can be up to N and the number of connections can be 2**N, where ** represents exponentiation.

[0110] 5. The Markov chain is formed after the connections are established. This Markov chain will settle on its invariant distribution. A distribution function is deployed to describe such a distribution.

[0111] 6. This distribution function, once obtained, can be used to classify images. This will produce triplets of image, class, and weight. Image retrieval and classification are two sides of the same coin.

[0112] 7. These triplets of image, class, and weight can be viewed as the results of the classification process. For the search process, a doublet of image and weight is displayed. The second part of the triplet is omitted because the search problem has only one class.
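The 2**N figure in step 4 can be checked with a short enumeration. The sketch below reads a connection of order K as a subset of K neurons, which is an assumption about the intended meaning; counting the empty subset (order 0) brings the total to exactly 2**N. The function name `possible_connections` is hypothetical.

```python
from itertools import combinations
from typing import Iterator, Tuple


def possible_connections(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every subset of neurons 0..n-1, from order 0 up to order n."""
    for order in range(n + 1):
        yield from combinations(range(n), order)


# For N = 4 neurons there are 2**4 = 16 possible connections.
connections = list(possible_connections(4))
```

The exponential growth is why a practical network can store only the non-zero connections rather than the full space.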

[0113] Overview of the APN Algorithm

[0114] The basic flow chart for APN algorithm is:

[0115] 1. Create an APN neural net with no connections;

[0116] 2. Combine an image and its classification into an input vector.

[0117] 3. Impose the input vector to the learning module.

[0118] 4. The APN neural connections are calculated based on the input vector. Let N be the number of neurons; the order of connections can be up to N and the number of connections can be 2**N, where ** represents exponentiation.

[0119] 5. A mapping over each connection is established. Let K be a number of neurons in a K order connection, where K is less than or equal to N, then this will be a K to K mapping, i.e. the domain of the mapping has K integers and the range of the mapping has K integers.

[0120] 6. The K-elements mapping is changed to N-element mapping by adding (N−K) pairs of 0 to 0 relations for each of the neurons not in the set K. By taking the domain of this mapping away, the range of this mapping forms a vector, APN connection vector.

[0121] 7. The Markov chain is formed after the connections are established. This chain will settle on its invariant distribution. A distribution function is deployed to describe such a distribution.

[0122] 8. This distribution function, once obtained, can be used to classify images. This will produce triplets of image, class, and weight.

[0123] 9. Comparing the input-vector and the APN-connection-vector modifies this weight. This will produce a new set of triplets of image, classification, and weight.

[0124] 10. These triplets of image, class, and weight can be viewed as the results of the classification process. For the search process, a doublet of image and weight is displayed. The second part of the triplet is omitted because the search problem has only one class.
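Steps 5 and 6 of the APN flow chart (extending a K-element mapping to N elements and keeping only the range) can be sketched as follows. The representation is an assumption: each of the K neurons in a connection is taken to carry one (domain value, range value) pair, and the function name `apn_connection_vector` is hypothetical.

```python
from typing import Dict, List, Tuple


def apn_connection_vector(mapping: Dict[int, Tuple[int, int]], n: int) -> List[int]:
    """Pad a K-neuron mapping to N neurons and keep only the range values.

    mapping: neuron index -> (domain value, range value) for the K neurons
             taking part in the connection.
    n:       total number of neurons N, with K <= N.
    """
    if any(not 0 <= idx < n for idx in mapping):
        raise ValueError("neuron index out of range")
    # Add a 0-to-0 pair for every neuron that is not in the K-set.
    padded = {idx: mapping.get(idx, (0, 0)) for idx in range(n)}
    # Drop the domain; the range values form the connection vector.
    return [padded[idx][1] for idx in range(n)]


# A 2nd-order connection over neurons 1 and 4 in a 6-neuron net.
vector = apn_connection_vector({1: (3, 7), 4: (2, 5)}, n=6)
```

The resulting vector has the same length as the input vector, which is what makes the comparison in step 9 possible.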

[0125] User Interface Layer of software for implementation of ABM and APN Algorithms

[0126] There are three major operations:

[0127] Search or retrieval;

[0128] Classification; and

[0129] Batch.

[0130] These are the principal modes of the system that runs on the workstation. The software executed in these three modes can have various user interfaces, such as in a Windows environment or a web environment, etc. The user interface collects the information necessary for the computation.

[0131] Other than the key and the source of images, the user interface may or may not pass the following information to the next layer:

[0132] The “Area of Interest” specifies an image segment by two clicks. These two clicks generate 4 numbers: the coordinates of the upper-left corner and the bottom-right corner.
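As a rough illustration, the two clicks can be turned into four numbers and then used to cut the segment out of a pixel array. The function names and the (x, y) click convention are assumptions, as is the choice to include the bottom-right pixel in the segment.

```python
from typing import List, Tuple

Grid = List[List[int]]
Point = Tuple[int, int]  # (x, y) in pixel coordinates


def area_of_interest(first_click: Point,
                     second_click: Point) -> Tuple[int, int, int, int]:
    """Turn two clicks into the 4 numbers (left, top, right, bottom)."""
    (x1, y1), (x2, y2) = first_click, second_click
    if x2 < x1 or y2 < y1:
        raise ValueError("second click must be below and right of the first")
    return x1, y1, x2, y2


def crop(image: Grid, box: Tuple[int, int, int, int]) -> Grid:
    """Cut the segment out of the image; the bottom-right pixel is included."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
segment = crop(image, area_of_interest((1, 0), (2, 1)))
```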

[0133] The “internal representation” specifies the dimensions of a pixel array used for computation, which may or may not be the actual image pixel array.

[0134] The “Background” or “Background filter” selects an image-processing filter the pixel array must pass through before entering the learning component of the system. The interface will be responsible for selecting one of many available filters.

[0135] The “Symmetry” represents similarity under certain types of changes, such as intensity, translation symmetry, Scaling, Rotation, oblique, combined rotation and scaling or any combination thereof. Translation symmetry is implemented by physically translating the sample image to all possible positions. Similar methods can be applied to the other symmetries.
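Translating a sample to all possible positions can be sketched as placing the sample at every offset inside a larger canvas. The canvas size, the zero background, and the function name `translations` are assumptions made for this example.

```python
from typing import Iterator, List

Grid = List[List[int]]


def translations(sample: Grid, height: int, width: int) -> Iterator[Grid]:
    """Yield the sample placed at every position of a height x width canvas.

    Pixels outside the sample are set to 0.
    """
    sh, sw = len(sample), len(sample[0])
    for top in range(height - sh + 1):
        for left in range(width - sw + 1):
            canvas = [[0] * width for _ in range(height)]
            for r in range(sh):
                canvas[top + r][left:left + sw] = sample[r]
            yield canvas


# A 1x2 sample on a 2x3 canvas has 2 * 2 = 4 possible positions.
shifted = list(translations([[1, 2]], height=2, width=3))
```

Each translated copy can then be presented to the learning module as an additional training sample.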

[0136] The “Rotation Types” specify the range of rotation if the rotation symmetry is used. Examples are 360°-rotations, −5° to 5° rotations, and −10° to 10° rotations, or other settings that fit the user's need.

[0137] The “Reduction Type” specifies the method used when reducing a large image pixel array to a smaller pixel array.

[0138] The “Sensitivity” deals with the sample segment size; high sensitivity is for small segment(s) and low sensitivity is for large segment(s). This is a method to limit the relevant neural connections. When an ABM net, x1, is trained, there will be certain connections. All possible connections together form a space, H1. For an ABM net with N neurons, such a space will have a maximum of 2**N points, where ** is exponentiation. Each trained ABM net will have a set, h1, representing its non-zero connections. When deciding whether an image, I2, in a search directory is a match to the current sample image, I1, this image I2 can in turn be used to train a new but similar ABM neural net, x2. This will generate a set of connections, h2. Similarity determines a maximum distance, d, using the Hausdorff distance, the L1 distance, or the L2 distance. In the connection space, starting from the connection set, h2, of the new ABM net, after applying this distance, d, a new set, h3, is obtained. Obviously, the smaller this distance, d, is, the smaller this new set, h3, will be. This new set, h3, is then transformed back to h1. Any point in h1 but not in h3 will be considered “too far” and is therefore set to 0 for the current image, I2, in the search directory. This reduction in the connection space is determined by the sensitivity.

[0139] The “Blurring” measures the distortion due to data compression, translation, rotation, scaling, intensity change, and image format conversion. This method expands an image in the search directory from a single point to a set as follows. All possible images together form a space, the image space. An image is a point in such a space. When deciding whether an image, I2, in a search directory is a match to the current sample image, I1, this image I2 can be turned into a small set around I2. Let the set be IS2. Blurring determines a maximum distance, d, using the Hausdorff distance, the L1 distance, or the L2 distance. In the image space, starting from I2, after applying this distance, d, a new sphere set, IS2, is obtained. Obviously, the smaller this distance, d, is, the smaller this new set, IS2, will be. Now any point in this set, IS2, is just as good as I2. This expansion in the image space is determined by the Blurring.

[0140] The “Shape Cut” is to eliminate many images that have different shapes from the sample segment. All possible images together form a space, the image space. An image is a point in such a space. When deciding whether an image, I2, in a search directory is a match to the current sample image, I1, the distance between I1 and I2, d, can be determined, either using the Hausdorff distance or L1 distance or L2 distance. If this distance, d, is larger than a predetermined distance, D, a mismatch can be declared without going through the ABM neural net. This predetermined distance, D, is set by the “Shape Cut” parameter.
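The Shape Cut test can be sketched as a plain distance comparison carried out before any neural computation. The example covers only the L1 and L2 distances on flat pixel arrays of equal length (the Hausdorff distance is left out), and the function names are assumptions.

```python
import math
from typing import Callable, Sequence

Pixels = Sequence[int]


def l1_distance(a: Pixels, b: Pixels) -> float:
    """Sum of absolute pixel differences."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def l2_distance(a: Pixels, b: Pixels) -> float:
    """Euclidean distance between the two pixel arrays."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def passes_shape_cut(sample: Pixels, candidate: Pixels, max_distance: float,
                     metric: Callable[[Pixels, Pixels], float] = l1_distance) -> bool:
    """Return False when the candidate can be rejected without the neural net."""
    if len(sample) != len(candidate):
        raise ValueError("images must share the same internal representation")
    return metric(sample, candidate) <= max_distance
```

For example, with sample `[0, 3]` and candidate `[4, 0]` the L1 distance is 7 and the L2 distance is 5, so a predetermined distance D of 6 rejects the candidate under L1 but accepts it under L2.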

[0141] The “External Weight Cut” is to list only those retrieved images with weights greater than a certain value. The Weight Cut is an integer greater than or equal to 0. There is no limit on how large this integer can be.

[0142] The “Internal Weight Cut” plays a role similar to the “External Weight Cut”, but uses a percent value rather than an absolute weight value.
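The two weight cuts can be sketched as filters over the list of (name, weight) pairs. What the Internal Weight Cut percentage is measured against is not stated; the example assumes it is a percentage of the largest weight in the list, and both function names are hypothetical.

```python
from typing import List, Tuple

Result = Tuple[str, int]  # (image name, weight)


def external_weight_cut(results: List[Result], cut: int) -> List[Result]:
    """Keep results whose absolute weight is greater than `cut`."""
    return [(name, w) for name, w in results if w > cut]


def internal_weight_cut(results: List[Result], percent: float) -> List[Result]:
    """Keep results whose weight exceeds `percent` of the largest weight.

    Measuring the percentage against the largest weight is an assumption.
    """
    if not results:
        return []
    threshold = max(w for _, w in results) * percent / 100.0
    return [(name, w) for name, w in results if w > threshold]


results = [("a.jpg", 900), ("b.jpg", 450), ("c.jpg", 90)]
kept_external = external_weight_cut(results, 100)
kept_internal = internal_weight_cut(results, 20)
```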

[0143] The “Image Type” specifies the ABM or APN algorithm. It also instructs the neural layer component how to compute the weights. The weight can be computed by using the invariant function of the Markov chain, or by integrating all contributions in the time evolution of the Markov chain, with or without reaching the invariant distribution.

[0144] The “L/S Segment” (Large/Small segment) tells the system where to focus when searching images. Please refer to the similarity to understand the set of contributing connections, i.e. not every connection is a contributing connection. Small and Large segments deploy different scales in determining the set of connections.

[0145] The “Short/Long” search specifies an image source such as whether to search one directory or many directories.

[0146] The “Short Cut” is a Scrollbar to select an integer between 0 and 99; each integer is mapped to a set of predefined settings for the parameters.

[0147] The “Border Cut” is to eliminate the border sections of images. This parameter controls the percentage of images to be eliminated before entering consideration.

[0148] The “Segment Cut” is best illustrated by an example. Assume a 400×400 image is reduced to a 100×100 internal representation, as set by the parameter “Internal Representation”; then each block of 16 original pixels will be reduced into 1 pixel. The new value of the single pixel is determined by the parameter “Reduction Type”. The “Segment Cut” sets a threshold: if the number of non-zero pixels is greater than the threshold, the pixel will have a non-zero value; otherwise, the pixel will have a zero value.
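The threshold rule can be sketched as follows for square blocks, for instance the 4×4 blocks of 16 original pixels in the example. The non-zero output value is fixed to a placeholder `on_value` here, whereas in the described system it would come from the “Reduction Type”; the function name is an assumption.

```python
from typing import List

Grid = List[List[int]]


def reduce_with_segment_cut(image: Grid, factor: int, segment_cut: int,
                            on_value: int = 1) -> Grid:
    """Reduce each factor x factor block to one pixel.

    A block becomes `on_value` when it holds more than `segment_cut`
    non-zero pixels, and 0 otherwise.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of the factor")
    reduced = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            nonzero = sum(
                1
                for r in range(top, top + factor)
                for c in range(left, left + factor)
                if image[r][c] != 0
            )
            row.append(on_value if nonzero > segment_cut else 0)
        reduced.append(row)
    return reduced


# Two 4x4 blocks: the left holds 9 non-zero pixels, the right holds 3.
image = [[1, 1, 1, 0, 1, 0, 0, 0],
         [1, 1, 1, 0, 1, 0, 0, 0],
         [1, 1, 1, 0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0]]
reduced = reduce_with_segment_cut(image, factor=4, segment_cut=8)
```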

[0149] Presentation Layer of software for implementation of ABM and APN Algorithms

[0150] The presentation layer transforms the image data to neural data. The procedure includes:

[0151] 1. Open files from the image source;

[0152] 2. Decode the image into pixel arrays;

[0153] 3. Process images with a filter;

[0154] 4. Reduce the size of images to an internal representation. The users can arbitrarily choose the internal representation of the images. Such reduction can be chosen for individual images on a case-by-case basis, or the same reduction factor can be deployed across all images.

[0155] 5. In the case where many pixels in an image have to be combined into a new pixel before leaving this layer, the user can choose a reduction type such as taking the average, the maximum, or the minimum, or deploying a threshold.

[0156] 6. Pass the image array to the next layer.
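Steps 4 and 5 of the procedure above can be sketched as a block reduction with a selectable reduction type. The names `REDUCERS` and `reduce_image` are hypothetical, the image is assumed to be a list of rows of numeric pixel values, and the threshold variant is omitted for brevity.

```python
from statistics import mean

# Hypothetical lookup from a reduction type name to the combining function.
REDUCERS = {"average": mean, "maximum": max, "minimum": min}


def reduce_image(pixels, factor, reduction_type="average"):
    """Combine each factor-by-factor block of pixels into one new pixel."""
    reducer = REDUCERS[reduction_type]
    height, width = len(pixels), len(pixels[0])
    return [
        [
            reducer([
                pixels[y][x]
                for y in range(top, min(top + factor, height))
                for x in range(left, min(left + factor, width))
            ])
            for left in range(0, width, factor)
        ]
        for top in range(0, height, factor)
    ]
```

Using the same `factor` for every image corresponds to deploying one reduction factor across all images; choosing it per image corresponds to the case-by-case reduction.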

[0157] ABM Layer of software for implementation of ABM and APN Algorithms

[0158] The upper level of this layer has two branches:

[0159] Training Objects

[0160] High level training class

[0161] Low level training class and

[0162] Symmetry class

[0163] Recognition Objects

[0164] High level recognition class

[0165] Low level recognition class

[0166] The lower level of this layer has only one class, the memory management class.

[0167] The purpose of the memory management class is to claim memory space from RAM, 64K at a time. This memory space is used for storing the connections. The class also returns space that is no longer needed to the operating system of the computer.
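A chunked store of this kind can be sketched as below. The class name `ConnectionMemory` and its methods are hypothetical, and in Python releasing a chunk only makes it eligible for garbage collection rather than guaranteeing its return to the operating system.

```python
CHUNK_SIZE = 64 * 1024  # 64K, the allocation unit named in the text


class ConnectionMemory:
    """Byte store for connections that grows and shrinks in 64K chunks."""

    def __init__(self):
        self._chunks = []
        self._used = 0

    @property
    def reserved(self):
        """Total number of bytes currently claimed."""
        return len(self._chunks) * CHUNK_SIZE

    def store(self, data):
        """Append bytes, claiming new chunks as needed; return the offset."""
        offset = self._used
        end = offset + len(data)
        while self.reserved < end:
            self._chunks.append(bytearray(CHUNK_SIZE))
        for i, byte in enumerate(data):
            position = offset + i
            self._chunks[position // CHUNK_SIZE][position % CHUNK_SIZE] = byte
        self._used = end
        return offset

    def truncate(self, used):
        """Keep only the first `used` bytes and release unneeded chunks."""
        if used > self._used:
            raise ValueError("cannot truncate beyond the stored data")
        self._used = used
        needed = -(-used // CHUNK_SIZE)  # ceiling division
        del self._chunks[needed:]
```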

[0168] The low level training object provides all necessary functions used by the high level training class.

[0169] The symmetry object implements the symmetry defined earlier.

[0170] The high level training class incorporates symmetry and implements the ABM or APN algorithm. The “Image Type” parameter in the user interface determines which algorithm will be used.

[0171] ABM Training Algorithm is:

[0172] 1. Delete the existing ABM connections;

[0173] 2. Combine an image and its classification into an input vector.

[0174] 3. The ABM neural connections are calculated based on the input vector. Let N be the number of neurons; these connections can be of order up to N. The image is randomly broken down into a predefined number of pieces.

[0175] 4. Let an image piece, p1, have K=(k1+k2) pixels, where K is an integer. After imposing the pixel vector on the ABM net, k1 is the number of neurons excited and k2 is the number of neurons grounded. A neural state vector can be constructed to represent such a configuration, with k1 components being 1 and k2 components being 0.

[0176] 5. All such vectors together form a space, the connection space. A distance, either the Hausdorff distance, the L1 distance, or the L2 distance, can be defined in this space. Such a definition of a distance allows all possible connection vectors to be classified via a distance from p1. Many vectors will be in a group with distance 1 from p1. Many vectors will be in a group with distance 2 from p1, . . . .

[0177] 6. The connection represented by p1 is assigned the largest synaptic connection weight. Those connections in the distance 1 group will have smaller weights, . . . . After a certain distance, the connection weights will be 0, or there will be no connections. The present invention covers all possible combinations of such a generating method.

[0178] 7. The Markov chain is formed after the connections are established.
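Steps 4 through 6 of the training algorithm can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: a pixel above `threshold` is taken to excite its neuron, the L1 distance is used, and the weight is halved for each unit of distance. The text only requires that weights decrease with distance and vanish beyond some distance.

```python
from itertools import combinations


def state_vector(piece, threshold=0):
    """1 for an excited neuron, 0 for a grounded one (assumed rule)."""
    return tuple(1 if value > threshold else 0 for value in piece)


def l1_distance(a, b):
    """L1 distance between two binary state vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def neighbours(vector, distance):
    """All binary vectors at exactly the given L1 distance from `vector`."""
    for flipped in combinations(range(len(vector)), distance):
        candidate = list(vector)
        for index in flipped:
            candidate[index] = 1 - candidate[index]
        yield tuple(candidate)


def connection_weights(piece, max_weight=100, max_distance=2):
    """Assign the largest weight to p1 and smaller weights to nearby vectors."""
    p1 = state_vector(piece)
    weights = {}
    for distance in range(max_distance + 1):
        for vector in neighbours(p1, distance):
            # Hypothetical decay: halve the weight per unit of distance.
            weights[vector] = max_weight // (2 ** distance)
    # Vectors farther than max_distance get no connection at all.
    return weights
```

For the piece `[7, 0, 3]` the state vector is `(1, 0, 1)`; it receives the full weight, its three distance-1 neighbours receive half, and the vector `(0, 1, 0)` at distance 3 receives no connection.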

[0179] APN Training Algorithm is:

[0180] 1. Delete the existing ABM connections;

[0181] 2. Combine an image and its classification into an input vector.

[0182] 3. The ABM neural connections are calculated based on the input vector. Let N be the number of neurons; these connections can be of order up to N. The image is randomly broken down into a predefined number of pieces.

[0183] 4. Let an image piece, p1, have K=(k1+k2) pixels, where K is an integer. After imposing the pixel vector on the ABM net, k1 is the number of neurons excited and k2 is the number of neurons grounded. A neural state vector can be constructed to represent such a configuration, with k1 components being 1 and k2 components being 0.

[0184] 5. All such vectors together form a space, the connection space. A distance, either the Hausdorff distance, the L1 distance, or the L2 distance, can be defined in this space. Such a definition of a distance allows all possible connection vectors to be classified via a distance from p1. Many vectors will be in a group with distance 1 from p1. Many vectors will be in a group with distance 2 from p1, . . . .

[0185] 6. The connection represented by p1 is assigned the largest synaptic connection weight.

[0186] Those connections in the distance 1 group will have smaller weights, . . . . After a certain distance, the connection weights will be 0, or there will be no connections. The present invention covers all possible combinations of such a generating method.

[0187] 7. The Markov chain is formed after the connections are established.

[0188] 8. For each connection, in addition to the synaptic connection weight, a mapping over each connection is established. Let k1 be the number of neurons in the original k1-order connection generated by p1; then this mapping maps the k1 neurons to the k1 pixel values that excited these neurons. This completes the connection for the original segment p1.

[0189] 9. The segment, p1, also generates many other connections. If a neuron in such a connection is one of the original k1 neurons in p1, then this neuron is mapped to the corresponding pixel value, which caused this neuron to be excited; otherwise, this neuron is mapped to 0. This completes the mappings of all connections generated by this segment p1.
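The mapping in steps 8 and 9 can be sketched as a dictionary from neuron index to pixel value. The function name, the representation of a connection as a tuple of neuron indices, and the excitation rule (pixel value above `threshold`) are assumptions made for illustration.

```python
def apn_mapping(piece, connection, threshold=0):
    """Map each neuron in a connection to a pixel value.

    piece      -- pixel values of segment p1, one per neuron
    connection -- indices of the neurons taking part in the connection
    """
    # Assumption: a neuron is excited when its pixel exceeds the threshold.
    excited = {i for i, value in enumerate(piece) if value > threshold}
    return {
        neuron: (piece[neuron] if neuron in excited else 0)
        for neuron in connection
    }
```

For the piece `[7, 2, 3, 0]` with `threshold=2`, the original connection `(0, 2)` maps to the pixel values 7 and 3, while a generated connection `(0, 1)` maps neuron 0 to 7 and neuron 1 to 0.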

[0190] The low-level recognition object provides all necessary functions used by the high-level recognition class.

[0191] The high-level recognition class implements the ABM or APN algorithm. The “Image Type” parameter in the user interface determines which algorithm will be used.

[0192] ABM Recognition Algorithm is:

[0193] 1. An image to be classified is imposed on the Markov Chain.

[0194] 2. This Markov chain will settle on its invariant distribution. A distribution function is deployed to describe such a distribution.

[0195] 3. This distribution function, once obtained, can be used to classify images. This will produce triplets of image, class, and weight. Image retrieval and classification are two sides of the same coin.

[0196] 4. These triplets of image, classification, and weight can be viewed as the results of the classification process. For the search process, a doublet of image and weight is displayed. The second part of the triplet is omitted because the search problem has only one class.
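The relation between triplets and doublets can be sketched as follows. The function names are hypothetical, and picking the class with the largest weight is an assumed decision rule; the text says only that the weights are used to classify.

```python
def classify(triplets):
    """For each image, keep the (class, weight) pair with the largest weight."""
    best = {}
    for image, label, weight in triplets:
        if image not in best or weight > best[image][1]:
            best[image] = (label, weight)
    return best


def search_doublets(triplets):
    """Drop the class, since search has a single class, and sort by weight."""
    doublets = [(image, weight) for image, _, weight in triplets]
    return sorted(doublets, key=lambda pair: pair[1], reverse=True)
```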

[0197] APN Recognition Algorithm is

[0198] 1. An image to be classified is imposed on the Markov Chain.

[0199] 2. This chain will settle on its invariant distribution. A distribution function is deployed to describe such a distribution.

[0200] 3. This distribution function, once obtained, can be used to classify images. This will produce triplets of image, class, and weight.

[0201] 4. Comparing the input vector with the APN connection vector modifies this weight. All connection vectors together form a vector space. A distance, either the L1 distance or the L2 distance, can be defined in this space. The basic idea is that the new weight is directly proportional to the old weight and inversely proportional to this distance. The present invention covers all functions for obtaining the new weight:

New weight=f (old weight, distance).

[0202] This will produce a new set of triplets of image, classification, and weight.

[0203] 5. These triplets of image, classification, and weight can be viewed as the results of the classification process. For the search process, a doublet of image and weight is displayed. The second part of the triplet is omitted because the search problem has only one class.
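One possible choice of the function f in step 4 can be sketched as below. The specific form, old weight divided by (1 + distance), is an assumption chosen so that a distance of zero leaves the weight unchanged and avoids division by zero; the text allows any function of the old weight and the distance.

```python
import math


def l1_distance(a, b):
    """Sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def new_weight(old_weight, distance):
    """Hypothetical f: grows with the old weight, shrinks with distance."""
    return old_weight / (1.0 + distance)


def apn_weight(old_weight, input_vector, connection_vector, metric=l1_distance):
    """Modify a weight by comparing the input with the APN connection vector."""
    return new_weight(old_weight, metric(input_vector, connection_vector))
```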

[0204] IVI-API (Image Verification and Identification Application Programming Interface)

[0205] A typical image matching application structure is:

[0206] GUI (graphical user interface) Layer

[0207] DBMS (database management system) Layer

[0208] IVI-API (image verification and identification API) Layer

[0209] SPI (Service Provider Interface) Layer

[0210] OS (Operating System) and Hardware Layer

[0211] The IVI-API is transparent for the SPI (Service Provider Interface): the SPI functions pass right through the IVI-API. The SPI can be accessed directly from the layers above the IVI-API layer, i.e. the DBMS layer or the GUI layer.

[0212] There are two main functions in API layer: verify and identify; and there is one main function in the SPI layer: capture.

[0213] The two top-level jobs for verification are Enrollment and Verify. The two top-level jobs for identification are Enrollment and Identify. The enrollment, in either case, is nothing but setting a few parameters; the IVI-API deals with the raw images directly. In this API, there is only one top-level function for verifications, Verify; and there is only one top-level function for identifications, Identify.

[0214] This IVI-API does not have an enrollment process. The enrollment is replaced by setting two parameters:

[0215] The image in question;

[0216] The folder of previously stored images.

[0217] The IVI-API does require an image storage structure that applications should follow, so that the folder of previously stored images can be passed to the verification and identification functions. Both the verification path and the identification path are parameters, which can be changed by the parameter writer functions. The image in question can be stored anywhere on a hard drive. The previously stored images must follow the structure below:

[0218] Verification

[0219] The previously stored images must be stored at:

[0220] verification path\ID\.

[0221] Example. Assume:

[0222] 1. The verification path (a parameter) is:

[0223] c:\Attrasoft\verification\

[0224] 2. A set of doublets is:

Image          imageID
Gina1.jpg      12001
Gina2.jpg      12001
Tiffany1.jpg   12002
Tiffany2.jpg   12002

[0225] Then the storage structure is:

[0226] c:\Attrasoft\verification\12001\gina1.jpg

[0227] c:\Attrasoft\verification\12001\gina2.jpg

[0228] c:\Attrasoft\verification\12002\tiffany1.jpg

[0229] c:\Attrasoft\verification\12002\tiffany2.jpg
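The verification layout can be expressed as a small path helper. The function name is hypothetical, and lowercasing the file name is an inference from the example, where Gina1.jpg is stored as gina1.jpg.

```python
from pathlib import PureWindowsPath


def verification_file(verification_path, image_id, image_name):
    """Build the storage path: verification path\\ID\\image."""
    # Assumption: stored file names are lowercased, as in the example.
    return PureWindowsPath(verification_path) / str(image_id) / image_name.lower()
```

For instance, `verification_file("c:\\Attrasoft\\verification\\", 12001, "Gina1.jpg")` yields the first path in the listing above.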

[0230] Identification

[0231] The folder of previously stored images must be stored at:

[0232] identification path\

[0233] Example. Assume:

[0234] 1. The identification path (a parameter) is:

[0235] c:\Attrasoft\identification\

[0236] 2. A set of doublets is:

Image          imageID
Gina1.jpg      12001
Gina2.jpg      12001
Tiffany1.jpg   12002
Tiffany2.jpg   12002

[0237] If the number of images is less than 1000, then the storage structure is

[0238] c:\Attrasoft\identification\gina1.jpg

[0239] c:\Attrasoft\identification\gina2.jpg

[0240] c:\Attrasoft\identification\tiffany1.jpg

[0241] c:\Attrasoft\identification\tiffany2.jpg

[0242] If the number of images is more than 1000, then the sub-directories should be used:

[0243] c:\Attrasoft\identification\dir0000\gina1.jpg

[0244] c:\Attrasoft\identification\dir0000\gina2.jpg

[0245] c:\Attrasoft\identification\dir0000\tiffany1.jpg

[0246] c:\Attrasoft\identification\dir0000\tiffany2.jpg
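The identification layout can be sketched in the same way. Several details are assumptions, because the text does not state them: exactly 1000 images is treated as the flat case, each sub-directory is assumed to hold 1000 images, and sub-directories are assumed to be numbered dir0000, dir0001, and so on.

```python
from pathlib import PureWindowsPath

FLAT_LIMIT = 1000     # up to this many images are stored directly in the folder
PER_DIRECTORY = 1000  # hypothetical capacity of each dirNNNN sub-directory


def identification_files(identification_path, image_names):
    """Return the storage path for each image under the identification path."""
    root = PureWindowsPath(identification_path)
    names = [name.lower() for name in image_names]
    if len(names) <= FLAT_LIMIT:
        return [root / name for name in names]
    return [
        root / f"dir{index // PER_DIRECTORY:04d}" / name
        for index, name in enumerate(names)
    ]
```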

[0247] Enrollment

[0248] The enrollment process builds the folder of previously stored images according to the above structure. The folder of previously stored images is a parameter for the IVI-API layer, called the verification directory, the identification directory, or the search directory. A later section addresses the parameters. Because enrollment only means passing parameters, the enrollment rate is always 100%.

[0249] 1:1 Matching

[0250] The following methods (one main function and three result readers) are used to perform the Verification function:

[0251] int verify(String image, long imageID);

[0252] long getVerifyID( );

[0253] String getVerifyName( );

[0254] long getVerifyWeight( );

[0255] A typical process is:

[0256] Initialize System

[0257] Capture image

[0258] Calculate the template

[0259] Verify

[0260] However, because “Calculate the template” is not required in this IVI-API, and the system is initialized before the verification process starts, the process is:

[0261] Capture

[0262] Verify

[0263] The capture( ) functions are provided in the SPI, which can be accessed directly by applications. Both the image in question and the folder of previously stored images are on the hard drive. The applications then pass (String image, long imageID) to the verify( ) function.

[0264] 1:N Matching

[0265] The following methods are used to perform the Identification function:

[0266] int identify(String image);

[0267] long[] getIdentifyID( );

[0268] String[] getIdentifyName( );

[0269] long getIdentifyWeight( ).

[0270] Both the image in question and the folder of previously stored images are on the hard drive. The applications then pass (String image) to the identify( ) function.

[0271] Parameters

[0272] The set of parameters forms an array:

[0273] void setParameter(int I, long x); // a[I]=x

[0274] long getParameter(int I); // return a[I]
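The parameter array can be mirrored in a few lines. This is only a sketch: the class name, the array size, and the index constant `EXTERNAL_WEIGHT_CUT` are hypothetical, since the text does not list which index holds which parameter.

```python
EXTERNAL_WEIGHT_CUT = 0  # hypothetical index; the real layout is not given


class Parameters:
    """Array-backed parameter store with a writer and a reader."""

    def __init__(self, size=16):
        self._values = [0] * size

    def set_parameter(self, index, value):
        """Writer: a[index] = value."""
        self._values[index] = int(value)

    def get_parameter(self, index):
        """Reader: return a[index]."""
        return self._values[index]
```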

[0275] Sample Implementation

[0276] We will present three sample implementations based on FIG. 2 (3-Layer Architecture). The first example has all 3 layers; the second example has only 1 layer; and the third example has 2 layers.

[0277] There are two CD's labeled “Document, Sample Implementation”. The disks contain only three ASCII files. Each disk in the duplicate set is identical. The contents of the CD are:

File Name   Type   Size      Date           Description
ABM4_9      TXT    156,256   May 16, 2002   Detailed description of ImageFinder 4.9
ABM5_0      TXT     96,515   May 16, 2002   Detailed description of PolyApplet 5.0
ABM5_1      TXT     43,019   May 16, 2002   Detailed description of TransApplet 5.1

[0278] These three files will give detailed descriptions of the three sample implementations below.

[0279] Attrasoft ImageFinder 4.9

[0280] A sample Invention Application Software is the Attrasoft ImageFinder 4.9, which has all three layers in FIG. 2. FIG. 3 shows the ImageFinder User Interface using the Present Invention. FIG. 4 shows a sample Key Input in the ImageFinder software using the Present Invention. FIG. 5 shows a sample Search Output of the Present Invention. The search output is a list of pairs. FIG. 6 shows a sample Classification output of the Present Invention. The classification output is a list of triplets.

[0281] The ASCII file, ABM4_9.TXT, in the CD's labeled “Document, Sample Implementation” will give a detailed description.

[0282] In addition, two CD's, labeled “Attrasoft ImageFinder 4.9”, contain sample implementation software. The software can be installed and run to test the proposed algorithm. Note:

[0283] A. The CD's contain non-ASCII files, such as the installation file and execution files. The installation files will install the following executable files to a computer with Microsoft Windows as the operating system:

[0284] Attrasoft ImageFinder 4.9 for Windows 95/98/ME, execution files;

[0285] Attrasoft ImageFinder 4.9 for Windows 2000/XP, execution files;

[0286] Data File for running the software;

[0287] User's Guide in Microsoft Word, and

[0288] User's Guide in html format.

[0289] These five files can also be run from the CD.

[0290] B. The Operating System is Windows 95, 98, ME, 2000, and XP.

[0291] C. Each disk in the duplicate set is identical.

[0292] D. Contents of the CD. Root Directory Contents:

File Name   Type    Size        Date                    Description
DISK1       ID              5   Jan. 05, 1990 9:31p     Installation File
DISK10      ID              5   Jan. 05, 1990 9:31p     Installation File
DISK11      ID              5   Jan. 05, 1990 9:31p     Installation File
DISK12      ID              5   Jan. 05, 1990 9:31p     Installation File
DISK13      ID              5   Jan. 05, 1990 9:32p     Installation File
DISK14      ID              5   Jan. 05, 1990 9:32p     Installation File
DISK2       ID              5   Jan. 05, 1990 9:32p     Installation File
DISK3       ID              5   Jan. 05, 1990 9:32p     Installation File
DISK4       ID              5   Jan. 05, 1990 9:33p     Installation File
DISK5       ID              5   Jan. 05, 1990 9:33p     Installation File
DISK6       ID              5   Jan. 05, 1990 9:33p     Installation File
DISK7       ID              5   Jan. 05, 1990 9:33p     Installation File
DISK8       ID              5   Jan. 05, 1990 9:34p     Installation File
DISK9       ID              5   Jan. 05, 1990 9:34p     Installation File
SETUP       EXE        47,616   Jan. 05, 1990 9:31p     Installation File
SETUP       INI            32   Jan. 05, 1990 9:31p     Installation File
SETUP       INS       147,449   Jan. 05, 1990 9:31p     Installation File
SETUP       ISS           510   Jan. 05, 1990 9:31p     Installation File
SETUP       PKG        15,061   Jan. 05, 1990 9:31p     Installation File
_INST32I    EX_       306,666   Jan. 05, 1990 9:31p     Installation File
_ISDEL      EXE         8,192   Jan. 05, 1990 9:31p     Installation File
_SETUP      1         721,623   Jan. 05, 1990 9:31p     Installation File
_SETUP      10      1,454,681   Jan. 05, 1990 9:31p     Installation File
_SETUP      11      1,455,574   Jan. 05, 1990 9:31p     Installation File
_SETUP      12      1,455,468   Jan. 05, 1990 9:31p     Installation File
_SETUP      13      1,454,113   Jan. 05, 1990 9:32p     Installation File
_SETUP      14      1,074,165   Jan. 05, 1990 9:32p     Installation File
_SETUP      2       1,454,796   Jan. 05, 1990 9:32p     Installation File
_SETUP      3       1,456,887   Jan. 05, 1990 9:32p     Installation File
_SETUP      4       1,455,245   Jan. 05, 1990 9:33p     Installation File
_SETUP      5       1,455,918   Jan. 05, 1990 9:33p     Installation File
_SETUP      6       1,455,206   Jan. 05, 1990 9:33p     Installation File
_SETUP      7       1,453,720   Jan. 05, 1990 9:33p     Installation File
_SETUP      8       1,455,603   Jan. 05, 1990 9:34p     Installation File
_SETUP      9       1,456,571   Jan. 05, 1990 9:34p     Installation File
_SETUP      DLL        10,752   Jan. 05, 1990 9:31p     Installation File
_SETUP      LIB       196,219   Jan. 05, 1990 9:31p     Installation File
ABM49       <DIR>               Jun. 08, 2001 1:04p     Executable File
USPTO72     <DIR>               Feb. 28, 2001 7:15p     Data File
USPTO74     <DIR>               May 21, 2001 4:33p      Data File

[0293] E. Interpretation of the files

[0294] Please see Appendix A for the detailed interpretation of the roles of these files. To install the software on a Personal Computer running Windows, double click the setup.exe file.

[0295] Attrasoft PolyApplet 5.0

[0296] A sample Invention Application Software is the PolyApplet 5.0, which only has the Neural Layer of this invention.

[0297] The ASCII file, ABM5_0.TXT, in the CD's labeled “Document, Sample Implementation” will give a detailed description.

[0298] Attrasoft TransApplet 5.1

[0299] A sample Invention Application Software is the TransApplet 5.1, which has both Neural Layer and the Presentation Layer of this invention.

[0300] The ASCII file, ABM5_1.TXT, in the CD's labeled “Document, Sample Implementation” will give a detailed description.

[0301] In addition, two CD's labeled “Attrasoft TransApplet 5.1” contain sample implementation of the software library. Note:

[0302] A. The disks contain only Non-ASCII files. The CD contains the following files:

[0303] Attrasoft TransApplet 5.1 software library for Windows 95/98/ME/2000/XP, COM/DLL file format;

[0304] Sample Implementation Code;

[0305] User's Guide in Microsoft Word, and

[0306] User's Guide in html format.

[0307] B. The Operating System is Windows 95, 98, ME, 2000, and XP.

[0308] C. Each disk in the duplicate set is identical.

[0309] D. Contents of the CD: Root Directory Contents:

File Name   Type    Size      Date                     Description
ABM5_1      DOC     616,448   Oct. 21, 2001 11:28a     User's Guide, Word
CHAP3       <DIR>             Oct. 19, 2001 4:31p      Examples
CHAP4       <DIR>             Oct. 19, 2001 4:31p      Examples
CHAP5       <DIR>             Oct. 19, 2001 4:31p      Examples
CHAP6       <DIR>             Oct. 19, 2001 4:31p      Examples
CHAP7       <DIR>             Oct. 19, 2001 4:32p      Examples
FBI         <DIR>             Jun. 08, 2001 1:04p      Examples
HELP        <DIR>             Oct. 19, 2001 4:40p      User's Guide, Word
OBLIQUE     <DIR>             Jun. 08, 2001 1:04p      Examples
README      TXT         567   Oct. 20, 2001 10:51a     readme.txt
TRANS~26    DLL     282,112   Oct. 21, 2001 11:00a     COM DLL

[0310] E. Interpretation of the files

[0311] (E1) The file labeled “COM DLL” is the COM DLL software library file to be used by users.

[0312] (E2) The directories, labeled “Examples”, contain the examples of how to use the COM DLL.

[0313] (E3) The files, labeled “User's Guide, Word” and the directory, “User's Guide, html”, contain the User's Guide. 

What is claimed is:

1. A computer implemented process for content-based image search or retrieval with these steps: specifying sample image(s) or/and segment(s) or/and directory and/or directories; specifying training parameters; training by one click; specifying the directory or directories to be searched; specifying search parameters; searching by one click.

2. A computer implemented process of claim 1, wherein the order of steps is altered to cover all possible combinations.

3. A computer implemented process for image classification with these steps: specifying sample image(s) or/and segment(s) or/and directory and/or directories; specifying training parameters; training by one click; specifying the directory or directories to be searched; specifying search parameters; searching by one click. Repeat the above process for each class. When all classes are covered, classify the images by one click.

4. A computer implemented process of claim 3, wherein the order of steps is altered to cover all possible combinations.

5. A computer implemented process of claim 1 (search), wherein the steps are saved in the batch code and executed by a batch command. The batch code can be entered into the system in several ways, including:
a. Click a save button to save the current setting, including key, search directory, and parameters into a batch code.
b. Click a file button to recall one of many batch codes saved earlier.
c. Cut and paste or simply type in a batch code by keyboard.
d. Obtain the code from a file.

6. A computer implemented process of claim 3 (classification), wherein the steps are saved in the batch code and executed by a batch command. The batch code can be entered into the system in several ways, including:
a. Click a save button to save the current setting, including key, search directory, and parameters into a batch code.
b. Click a file button to recall one of many batch codes saved earlier.
c. Cut and paste or simply type in a batch code by keyboard.
d. Obtain the code from a file.

7. A computer implemented process of claim 1 and 3 (search and classification), further comprising the step of retraining. This allows the system to be trained by more than one image, or segment of an image, or a directory containing images.

8. A computer implemented process of claim 1 and 3 (search and classification), further comprising the step of simply mapping a part of or all of the parameters to one or two integers represented by Scrollbar(s), thus allowing the simplification of setting parameters.

9. A computer implemented process of claim 1 (search), further comprising output results being listed both in the system and in a new exiting process such as Microsoft Internet Explorer. The output web page has a list of names and weights:
a. The weight of an image is related to the characteristics users are looking for (the weight).
b. Click the name of each image and an image will pop up on the screen.

10. A computer implemented process of claim 3 (classification), further comprising output results being listed both in the system and in a new exiting process such as Microsoft Internet Explorer. The output web page has a list of names and weights:
a. An image link for each image in the search directory;
b. The classification weights of this image in each search; and
c. The classification of this image as a link.

11. A computer implemented process of claim 1 and 3 (search and classification), wherein the steps of setting parameters comprise the “Area of Interest”, which specifies an image segment, which is specified by 4 numbers: the coordinates of the upper-left corner and the bottom-right corner, obtained in two clicks.

12. A computer implemented process of claim 1 and 3 (search and classification), wherein the steps of setting parameters comprise the “internal representation”, which specifies the dimensions of a pixel array used for computation, which may or may not be the actual image pixel array.

13. A computer implemented process of claim 1 and 3 (search and classification), wherein the steps of setting parameters comprise the “Symmetry”, which represents similarity under certain types of changes, such as intensity, translation symmetry, Scaling, Rotation, combined rotation and scaling, or combination thereof.

14. A computer implemented process of claim 1 and 3 (search and classification), wherein the steps of setting parameters comprise the “Sensitivity”, which deals with the sample segment size; high sensitivity is for small segment(s) and low sensitivity is for large segment(s).

15. A computer implemented process of claim 1 and 3 (search and classification), wherein the steps of setting parameters comprise the “Blurring”, which measures the distortion due to data compression, translation, rotation, scaling, intensity change, and image format conversion, or combination thereof.

16. A computer implemented process of claim 1 and 3 (search and classification), wherein the steps of setting parameters comprise the “Shape Cut”, which eliminates many images that have different shapes from the sample segment.

17. A computer implemented process of claim 1 and 3 (search and classification), wherein the steps of setting parameters comprise the “image types”, which specifies the ABM or APN algorithm.

18. A computer implemented process of claim 1 and 3 (search and classification), wherein the parameters are provided in a file, which specifies more complicated settings than the graphical user interface. For example, just search through images listed in a file.

19. A computer implemented process of claim 1 and 3 (search and classification), wherein the neural layer deploys the ABM or/and APN algorithm.

20. The ABM algorithm, including ABM learning algorithm and ABM recognition algorithm.

21. The APN algorithm, including APN learning algorithm and APN recognition algorithm.

22. A component of the ABM or APN algorithm, “Symmetry”, which is implemented by physically applying the sample image to all possible positions and training the software with all of these transformed image(s) or segment(s).

23. A component of the ABM or APN algorithm, “Sensitivity” or whatever the terminology used, which deals with a particular way of limiting the relevant neural connections in a particular computation. When ABM net, x, is trained, there will be certain connections. All possible connections together form a connection space, the connection H. Deploying a distance in this connection space is an important step in the ABM or APN algorithm (See the description of Sensitivity). The present invention covers all methods and combinations of limiting a connection set in the connection space, especially with a distance as a parameter.

24. A component of the ABM or APN algorithm, “Blurring” or whatever the terminology used, which measures the distortion due to data compression, translation, rotation, scaling, intensity change, and image format conversion. All possible images together form a space, the image space. This method expands an image in the search directory from a single point to a set defined by a distance in the image space (See the description of Blurring). The present invention covers all methods and combinations of creating an image set in the image space, especially with a distance as a parameter, for the purpose of expanding the key(s).

25. A component of the ABM or APN algorithm, the “Shape Cut” or whatever the terminology used, which eliminates many images using the concept of image space (See the description of “Shape Cut”). The present invention covers all methods and combinations of creating an image set in the image space, especially with a distance as a parameter, for the purpose of limiting the number of images to pass through.

26. A component of the ABM or APN learning algorithm, where the connection space is used to generate connections, rather than a process of repeatedly modifying weights directly and observing the performance. Deploying the connection space for establishing connections is a very important part of the present invention. The present invention covers all methods of creating the synaptic connections directly in the connection space, especially with a distance as a parameter.

27. A component of the APN learning algorithm, which converts a binary neural net to a multi-valued neural net by deploying a mapping for each connection. The present invention covers all types of mapping.

28. A computer implemented process for content-based image verification, identification, retrieval, and classification with software components, which use IVI-API as an application-programming interface.
26 A component of the ABM or APN learning algorithm, where the connection space is used to generate connection, rather than a process of repetitions of modifying weights directly and observing the performances. Deploying the connection space for establishing connection is a very important part of the present invention. The present invention covers all method of creating the synaptic connections directly in the connection space, especially with a distance as a parameter. 27 A component of the APN learning algorithm, which converts binary neural net to multi-valued neural net by deploying a mapping for each connection. The present invention covers all type of mapping. 28 A computer implemented process for content-based images verification, identification, retrieval, and classification with software components, which use IVI-API as an application-programming interface. 